Oozie Shell Action Extension
Shell Action
Shell Action Logging
Shell Action Limitations
Appendix, Shell XML-Schema
AE.A Appendix A, Shell XML-Schema
Shell Action Schema Version 0.2
Shell Action Schema Version 0.1
首先配置job-tracker
, name-node
, Shell exec
元素,必须配置参数和配置.
两个必要条件:1.输出格式必须是一个有效Java属性文件.2.输出大小在2KB之内
语法:
<workflow-app name="[WF-DEF-NAME]" xmlns="uri:oozie:workflow:0.3">
...
<action name="[NODE-NAME]">
<shell xmlns="uri:oozie:shell-action:0.1">
<job-tracker>[JOB-TRACKER]</job-tracker>
<name-node>[NAME-NODE]</name-node>
<prepare>
<delete path="[PATH]"/>
...
<mkdir path="[PATH]"/>
...
</prepare>
<job-xml>[SHELL SETTINGS FILE]</job-xml>
<configuration>
<property>
<name>[PROPERTY-NAME]</name>
<value>[PROPERTY-VALUE]</value>
</property>
...
</configuration>
<exec>[SHELL-COMMAND]</exec>
<argument>[ARG-VALUE]</argument>
...
<argument>[ARG-VALUE]</argument>
<env-var>[VAR1=VALUE1]</env-var>
...
<env-var>[VARN=VALUEN]</env-var>
<file>[FILE-PATH]</file>
...
<archive>[FILE-PATH]</archive>
...
<capture-output/>
</shell>
<ok to="[NODE-NAME]"/>
<error to="[NODE-NAME]"/>
</action>
...
</workflow-app>
prepare
:启动作业前删除或创建的路径列表,指定的路径必须以 hdfs://HOST:PORT
开头 .
job-xml
:包含Shell作业的配置文件,在schema 0.2
,可以允许多个job-xml
元素来指定多个job.xml
文件.
configuration
:包含的配置属性将被提交到Shell作业中.
exec
包含Shell命令执行路径,Shell命令将使用一个或多个argument
元素.
argument
包含将要提交的参数.
env-var
包含环境变量并提交到Shell命令中.env-var
是一个环境变量键值对. If the pair contains the variable such as $PATH, it should follow the Unix convention such as PATH=$PATH:mypath. 不要使用 ${PATH} ,这样将被Oozie El evaluator替换.
shell
:节点创建一个Hadoop配置,这个配置作为一个有效本地文件在它运行的目录下. The exact file path is exposed to the spawned shell using the environment variable called OOZIE_ACTION_CONF_XML .The Shell application can access the environemnt variable to read the action configuration XML file path.
capture-output
:捕捉shell命令执行的STDOUT为输出. The Shell command output必须在Java 配置文件格式并且不能被执行超过2KB.在WF定义内,shell活动节点的输出可以通过String action:output(String node, String key)
函数来访问(Refer to section '4.2.6 Action EL Functions').
Example:
使用任何shell/perl 脚本或CPP executable
<workflow-app xmlns='uri:oozie:workflow:0.3' name='shell-wf'>
<start to='shell1' />
<action name='shell1'>
<shell xmlns="uri:oozie:shell-action:0.1">
<job-tracker>${jobTracker}</job-tracker>
<name-node>${nameNode}</name-node>
<configuration>
<property>
<name>mapred.job.queue.name</name>
<value>${queueName}</value>
</property>
</configuration>
<exec>${EXEC}</exec>
<argument>A</argument>
<argument>B</argument>
<file>${EXEC}#${EXEC}</file> <!--Copy the executable to compute node's current working directory -->
</shell>
<ok to="end" />
<error to="fail" />
</action>
<kill name="fail">
<message>Script failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
</kill>
<end name='end' />
</workflow-app>
提交Oozie作业的相应的作业属性文件如下:
oozie.wf.application.path=hdfs://localhost:8020/user/kamrul/workflows/script#Execute is expected to be in the Workflow directory.
#Shell Script to run
EXEC=script.sh
#CPP executable. Executable should be binary compatible to the compute node OS.
#EXEC=hello
#Perl script
#EXEC=script.pl
jobTracker=localhost:8021
nameNode=hdfs://localhost:8020
queueName=default
How to run any java program bundles in a jar.
<workflow-app xmlns='uri:oozie:workflow:0.3' name='shell-wf'>
<start to='shell1' />
<action name='shell1'>
<shell xmlns="uri:oozie:shell-action:0.1">
<job-tracker>${jobTracker}</job-tracker>
<name-node>${nameNode}</name-node>
<configuration>
<property>
<name>mapred.job.queue.name</name>
<value>${queueName}</value>
</property>
</configuration>
<exec>java</exec>
<argument>-classpath</argument>
<argument>./${EXEC}:$CLASSPATH</argument>
<argument>Hello</argument>
<file>${EXEC}#${EXEC}</file> <!--Copy the jar to compute node current working directory -->
</shell>
<ok to="end" />
<error to="fail" />
</action>
<kill name="fail">
<message>Script failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
</kill>
<end name='end' />
</workflow-app>
The corresponding job properties file used to submit Oozie job could be as follows:
提交Oozie作业的相应的作业属性文件如下:
oozie.wf.application.path=hdfs://localhost:8020/user/kamrul/workflows/script#Hello.jar file is expected to be in the Workflow directory.
EXEC=Hello.jar
jobTracker=localhost:8021
nameNode=hdfs://localhost:8020
queueName=default
Shell Action Limitations
Although Shell action can execute any shell command, there are some limitations.
- No interactive command is supported.
- Command can't be executed as different user using sudo.
- User has to explicitly upload the required 3rd party packages (such as jar, so lib, executable etc). Oozie provides a way using and tag through Hadoop's Distributed Cache to upload.
- Since Oozie will execute the shell command into a Hadoop compute node, the default installation of utility in the compute node might not be fixed. However, the most common unix utilities are usually installed on all compute nodes. It is important to note that Oozie could only support the commands that are installed into the compute nodes or that are uploaded through Distributed Cache.
Shell Action Schema Version 0.2
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
xmlns:shell="uri:oozie:shell-action:0.2" elementFormDefault="qualified"
targetNamespace="uri:oozie:shell-action:0.2"> <xs:element name="shell" type="shell:ACTION"/>
<xs:complexType name="ACTION">
<xs:sequence>
<xs:element name="job-tracker" type="xs:string" minOccurs="1" maxOccurs="1"/>
<xs:element name="name-node" type="xs:string" minOccurs="1" maxOccurs="1"/>
<xs:element name="prepare" type="shell:PREPARE" minOccurs="0" maxOccurs="1"/>
<xs:element name="job-xml" type="xs:string" minOccurs="0" maxOccurs="unbounded"/>
<xs:element name="configuration" type="shell:CONFIGURATION" minOccurs="0" maxOccurs="1"/>
<xs:element name="exec" type="xs:string" minOccurs="1" maxOccurs="1"/>
<xs:element name="argument" type="xs:string" minOccurs="0" maxOccurs="unbounded"/>
<xs:element name="env-var" type="xs:string" minOccurs="0" maxOccurs="unbounded"/>
<xs:element name="file" type="xs:string" minOccurs="0" maxOccurs="unbounded"/>
<xs:element name="archive" type="xs:string" minOccurs="0" maxOccurs="unbounded"/>
<xs:element name="capture-output" type="shell:FLAG" minOccurs="0" maxOccurs="1"/>
</xs:sequence>
</xs:complexType>
<xs:complexType name="FLAG"/>
<xs:complexType name="CONFIGURATION">
<xs:sequence>
<xs:element name="property" minOccurs="1" maxOccurs="unbounded">
<xs:complexType>
<xs:sequence>
<xs:element name="name" minOccurs="1" maxOccurs="1" type="xs:string"/>
<xs:element name="value" minOccurs="1" maxOccurs="1" type="xs:string"/>
<xs:element name="description" minOccurs="0" maxOccurs="1" type="xs:string"/>
</xs:sequence>
</xs:complexType>
</xs:element>
</xs:sequence>
</xs:complexType>
<xs:complexType name="PREPARE">
<xs:sequence>
<xs:element name="delete" type="shell:DELETE" minOccurs="0" maxOccurs="unbounded"/>
<xs:element name="mkdir" type="shell:MKDIR" minOccurs="0" maxOccurs="unbounded"/>
</xs:sequence>
</xs:complexType>
<xs:complexType name="DELETE">
<xs:attribute name="path" type="xs:string" use="required"/>
</xs:complexType>
<xs:complexType name="MKDIR">
<xs:attribute name="path" type="xs:string" use="required"/>
</xs:complexType>
</xs:schema>
Shell Action Schema Version 0.1
<?xml version="1.0" encoding="UTF-8"?>
<!--
Licensed to the Apache Software Foundation (ASF) under one
or more contributor license agreements. See the NOTICE file
distributed with this work for additional information
regarding copyright ownership. The ASF licenses this file
to you under the Apache License, Version 2.0 (the
"License"); you may not use this file except in compliance
with the License. You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
xmlns:shell="uri:oozie:shell-action:0.1" elementFormDefault="qualified"
targetNamespace="uri:oozie:shell-action:0.1">
<xs:element name="shell" type="shell:ACTION"/>
<xs:complexType name="ACTION">
<xs:sequence>
<xs:element name="job-tracker" type="xs:string" minOccurs="1" maxOccurs="1"/>
<xs:element name="name-node" type="xs:string" minOccurs="1" maxOccurs="1"/>
<xs:element name="prepare" type="shell:PREPARE" minOccurs="0" maxOccurs="1"/>
<xs:element name="job-xml" type="xs:string" minOccurs="0" maxOccurs="1"/>
<xs:element name="configuration" type="shell:CONFIGURATION" minOccurs="0" maxOccurs="1"/>
<xs:element name="exec" type="xs:string" minOccurs="1" maxOccurs="1"/>
<xs:element name="argument" type="xs:string" minOccurs="0" maxOccurs="unbounded"/>
<xs:element name="env-var" type="xs:string" minOccurs="0" maxOccurs="unbounded"/>
<xs:element name="file" type="xs:string" minOccurs="0" maxOccurs="unbounded"/>
<xs:element name="archive" type="xs:string" minOccurs="0" maxOccurs="unbounded"/>
<xs:element name="capture-output" type="shell:FLAG" minOccurs="0" maxOccurs="1"/>
</xs:sequence>
</xs:complexType>
<xs:complexType name="FLAG"/>
<xs:complexType name="CONFIGURATION">
<xs:sequence>
<xs:element name="property" minOccurs="1" maxOccurs="unbounded">
<xs:complexType>
<xs:sequence>
<xs:element name="name" minOccurs="1" maxOccurs="1" type="xs:string"/>
<xs:element name="value" minOccurs="1" maxOccurs="1" type="xs:string"/>
<xs:element name="description" minOccurs="0" maxOccurs="1" type="xs:string"/>
</xs:sequence>
</xs:complexType>
</xs:element>
</xs:sequence>
</xs:complexType>
<xs:complexType name="PREPARE">
<xs:sequence>
<xs:element name="delete" type="shell:DELETE" minOccurs="0" maxOccurs="unbounded"/>
<xs:element name="mkdir" type="shell:MKDIR" minOccurs="0" maxOccurs="unbounded"/>
</xs:sequence>
</xs:complexType>
<xs:complexType name="DELETE">
<xs:attribute name="path" type="xs:string" use="required"/>
</xs:complexType>
<xs:complexType name="MKDIR">
<xs:attribute name="path" type="xs:string" use="required"/>
</xs:complexType>
</xs:schema>